Semi-supervised Semantic Role Labeling via Graph-Alignment
Authors
Abstract
Semantic roles, which constitute a shallow form of meaning representation, have attracted increasing interest in recent years. Various applications have been shown to benefit from this level of semantic analysis, and a large number of publications have addressed the problem of semantic role labeling, i.e., the task of automatically identifying semantic roles in arbitrary sentences. A major limiting factor for these approaches, however, is the need for large manually labeled semantic resources to train semantic role labeling systems in the supervised learning paradigm. Consequently, the application of such systems is still limited to the small number of languages and domains for which sufficiently large semantic resources are available. This thesis addresses the knowledge acquisition problem of semantic role labeling, i.e., the substantial annotation effort required for the creation of semantic resources that can be used to train state-of-the-art semantic role labeling systems. Our main contribution is to formulate a semi-supervised approach to semantic role labeling, which requires only a small manually labeled corpus of role-annotated sentences. This initial seed corpus is augmented with annotation instances generated automatically from a large unlabeled corpus. The augmented corpus is used as training data for a supervised role labeler, to improve labeling performance over what can be attained when training on the manually labeled sentences alone. Our approach therefore reduces the annotation effort required to attain satisfactory performance and thus alleviates the knowledge acquisition problem, especially for languages and domains where the cost of annotating large semantic resources is prohibitive. The key idea of our semi-supervised approach is to measure the similarity between labeled sentences from the manually annotated resource and sentences from a large unlabeled corpus.
Similarity is conceptualized in terms of optimal graph alignments, which are employed to project annotations from labeled to unlabeled sentences. To select a set of novel training instances, similarity is operationalized as a measure of confidence, allowing us to limit the adverse effect of erroneous annotations. The optimization problem is formulated as an integer linear program and solved efficiently. The thesis broadly consists of two parts. In the theoretical part, our semi-supervised approach to semantic role labeling is described in detail.
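The alignment-and-projection idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy sentences, the dependency edges, the similarity table, the role names, the weight `alpha`, and the function name `best_alignment` are hypothetical and not taken from the thesis. The thesis solves the optimization as an integer linear program; the sketch swaps in an exhaustive search over injective node mappings, which is only feasible for very small graphs.

```python
from itertools import permutations


def best_alignment(src_nodes, src_edges, tgt_nodes, tgt_edges, lex_sim, alpha=0.5):
    """Brute-force search for the best injective partial mapping src -> tgt.

    Score = alpha * (sum of lexical similarities of aligned node pairs)
          + (1 - alpha) * (number of labeled edges preserved by the mapping).
    This scoring function is an illustrative assumption.
    """
    best_score, best_map = float("-inf"), {}
    # Pad with None so that any source node may remain unaligned.
    targets = list(tgt_nodes) + [None] * len(src_nodes)
    seen = set()
    for perm in permutations(targets, len(src_nodes)):
        if perm in seen:  # the None padding produces duplicate tuples
            continue
        seen.add(perm)
        mapping = {s: t for s, t in zip(src_nodes, perm) if t is not None}
        lexical = sum(lex_sim(s, t) for s, t in mapping.items())
        structural = sum(
            1
            for head, dep, rel in src_edges
            if head in mapping
            and dep in mapping
            and (mapping[head], mapping[dep], rel) in tgt_edges
        )
        score = alpha * lexical + (1 - alpha) * structural
        if score > best_score:
            best_score, best_map = score, mapping
    return best_score, best_map


# Toy similarity table (hypothetical values); identical words score 1.0.
SIM = {("cooked", "prepared"): 0.8, ("chef", "cook"): 0.9, ("meal", "dinner"): 0.7}


def lex_sim(a, b):
    return 1.0 if a == b else SIM.get((a, b), 0.0)


# Labeled seed sentence: "The chef cooked the meal" (content words only).
src_nodes = ["cooked", "chef", "meal"]
src_edges = {("cooked", "chef", "nsubj"), ("cooked", "meal", "obj")}
src_roles = {"chef": "Cook", "meal": "Food"}  # hypothetical role labels

# Unlabeled sentence: "The cook prepared dinner".
tgt_nodes = ["prepared", "cook", "dinner"]
tgt_edges = {("prepared", "cook", "nsubj"), ("prepared", "dinner", "obj")}

score, mapping = best_alignment(src_nodes, src_edges, tgt_nodes, tgt_edges, lex_sim)

# Project the role labels along the alignment onto the unlabeled sentence.
projected = {mapping[w]: role for w, role in src_roles.items() if w in mapping}
print(score, mapping, projected)
```

In this sketch the alignment score doubles as the confidence value: a projected sentence would be added to the training set only if its score clears some threshold, which is one way to read the abstract's remark about limiting the effect of erroneous annotations.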
Similar resources
Semi-Supervised Semantic Role Labeling via Structural Alignment
Large-scale annotated corpora are a prerequisite to developing high-performance semantic role labeling systems. Unfortunately, such corpora are expensive to produce, limited in size, and may not be representative. Our work aims to reduce the annotation effort involved in creating resources for semantic role labeling via semi-supervised learning. The key idea of our approach is to find novel ins...
Graph Alignment for Semi-Supervised Semantic Role Labeling
Unknown lexical items present a major obstacle to the development of broad-coverage semantic role labeling systems. We address this problem with a semi-supervised learning approach which acquires training instances for unseen verbs from an unlabeled corpus. Our method relies on the hypothesis that unknown lexical items will be structurally and semantically similar to known items for which annotat...
Semi-Supervised Semantic Role Labeling
Large-scale annotated corpora are a prerequisite to developing high-performance semantic role labeling systems. Unfortunately, such corpora are expensive to produce, limited in size, and may not be representative. Our work aims to reduce the annotation effort involved in creating resources for semantic role labeling via semi-supervised learning. Our algorithm augments a small number of manually l...
A Graph-Based Semi-Supervised Learning for Question Semantic Labeling
We investigate a graph-based semi-supervised learning approach for labeling semantic components of questions, such as topic, focus, and event, for the question understanding task. We focus on graph construction to handle learning with dense/sparse graphs and present Relaxed Linear Neighborhoods method, in which each node is linearly constructed from varying sizes of its neighbors based on the dens...
Semi-Supervised Semantic Role Labeling: Approaching from an Unsupervised Perspective
Reducing the reliance of semantic role labeling (SRL) methods on human-annotated data has become an active area of research. However, the prior work has largely focused on either (1) looking into ways to improve supervised SRL systems by producing surrogate annotated data and reducing sparsity of lexical features or (2) considering completely unsupervised semantic role induction settings. In th...
Journal title:
Volume, issue:
Pages: -
Publication date: 2011